ABSTRACT
Evaluating the performance of large compute clusters requires benchmarks with representative workloads. At Google, performance benchmarks are used to obtain metrics such as task scheduling delay and machine resource utilization in order to assess changes in application code, machine configurations, and scheduling algorithms. Existing approaches to workload characterization for high performance computing and grids focus on tasks' resource requirements for CPU, memory, disk, I/O, network, etc. Such requirements address how much of each resource a task consumes. However, in addition to resource requirements, Google workloads commonly include task placement constraints that determine which machines' resources a task may consume. Placement constraints arise from task dependencies on machine properties such as hardware architecture and kernel version.
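To make the distinction concrete, here is a minimal sketch (ours, not Google's scheduler; the attribute names and values are hypothetical) of a placement constraint acting as a predicate that filters which machines are eligible, independently of how much resource the task requests:

```python
# Hypothetical sketch: a placement constraint as a predicate over machine
# properties. Attribute names ("arch", "kernel", ...) are illustrative only.

machines = [
    {"id": "m1", "arch": "x86_64", "kernel": "2.6.32", "num_cores": 8},
    {"id": "m2", "arch": "arm64",  "kernel": "2.6.18", "num_cores": 4},
]

def satisfies(machine, constraint):
    # A task may be placed on a machine only if every constrained
    # attribute matches; resource requirements are checked separately.
    return all(machine.get(attr) == value for attr, value in constraint.items())

constraint = {"arch": "x86_64", "kernel": "2.6.32"}
eligible = [m["id"] for m in machines if satisfies(m, constraint)]
print(eligible)  # ['m1'] -- the constraint shrinks the candidate pool
```

The point of the sketch is that a constraint shrinks the pool of candidate machines before any resource accounting happens, which is exactly where the extra scheduling delay comes from.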
This paper develops methodologies for incorporating task placement constraints and machine properties into performance benchmarks of large compute clusters. Our studies of Google compute clusters show that constraints increase average task scheduling delays by a factor of 2 to 6, which often results in tens of minutes of additional task wait time. To understand why, we extend the concept of resource utilization to include constraints by introducing a new metric, the Utilization Multiplier (UM). UM is the ratio of the resource utilization seen by tasks with a constraint to the average utilization of that resource across the cluster. UM provides a simple model of the performance impact of constraints: task scheduling delays increase with UM. Finally, we describe how to synthesize representative task constraints and machine properties, and how to incorporate this synthesis into existing performance benchmarks. Using synthetic task constraints and machine properties generated by our methodology, we accurately reproduce the performance metrics of benchmarks run on Google compute clusters, with a discrepancy of only 13% in task scheduling delay and 5% in resource utilization.
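Stated as a formula (the notation below is ours, inferred from the definition in the abstract):

$$\mathrm{UM}_c(r) = \frac{u_c(r)}{\bar{u}(r)}$$

where $u_c(r)$ is the utilization of resource $r$ aggregated over the machines whose properties satisfy constraint $c$, and $\bar{u}(r)$ is the average utilization of $r$ over all machines in the cluster. A UM of 1 means the constraint is effectively free; a UM above 1 means constrained tasks compete for a busier pool of machines, which is why their scheduling delays grow with UM.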